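From Specialized AI to General-Purpose Large Language Models

The Paradigm Shift in AI

1. From Specific to General

The way AI models are trained and deployed has changed fundamentally.

Old paradigm (task-specific training): Models such as early CNNs or BERT were trained (or fine-tuned) for a single objective, for example sentiment analysis only. Translation, summarization, and other tasks each required a separate model.
New paradigm (centralized pre-training + prompting): A single very large model (a large language model, or LLM) learns general world knowledge from internet-scale datasets. Changing the input prompt is enough to steer it toward almost any language task.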
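
To make the "one model, many prompts" idea concrete, here is a minimal sketch in Python. It only builds the prompts; no real model is called. The template wording, the `PROMPT_TEMPLATES` table, and the `build_prompt` helper are all hypothetical names introduced for illustration.

```python
# Hypothetical prompt templates. The wording is illustrative only and is
# not taken from any real system or library.
PROMPT_TEMPLATES = {
    "sentiment": (
        "Classify the sentiment of the following text as positive or negative.\n\n"
        "Text: {text}\nSentiment:"
    ),
    "translation": (
        "Translate the following text into French.\n\n"
        "Text: {text}\nTranslation:"
    ),
    "summary": (
        "Summarize the following text in one sentence.\n\n"
        "Text: {text}\nSummary:"
    ),
}


def build_prompt(task: str, text: str) -> str:
    """Return the prompt that would steer one general model toward `task`."""
    if task not in PROMPT_TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return PROMPT_TEMPLATES[task].format(text=text)


if __name__ == "__main__":
    review = "The battery lasts all day and the screen is gorgeous."
    # The same input text is routed to three different tasks purely by
    # changing the instructions wrapped around it.
    for task in PROMPT_TEMPLATES:
        print(f"--- {task} ---")
        print(build_prompt(task, review))
```

Under the old paradigm, each of these three tasks would have needed its own trained model; here the only thing that changes is the text handed to a single model.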

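2. Architectural Evolution

Encoder-only (the BERT era): Focused on understanding and classification. These models read text bidirectionally to capture deep context, but they are not designed to generate new text.
Decoder-only (the GPT/Llama era): The modern standard for generative AI. These models use autoregressive modeling to predict the next token, which makes them well suited to open-ended generation and dialogue.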
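
One way to picture the difference between the two families is the attention mask, which controls which positions each token is allowed to look at. The sketch below is a simplified illustration in plain Python, not the code of any particular model; `bidirectional_mask` and `causal_mask` are hypothetical helper names.

```python
def bidirectional_mask(n: int) -> list[list[int]]:
    """Encoder-style mask: every position may attend to every position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> list[list[int]]:
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    tokens = ["the", "cat", "sat"]
    masks = [
        ("bidirectional", bidirectional_mask(len(tokens))),
        ("causal", causal_mask(len(tokens))),
    ]
    for name, mask in masks:
        print(name)
        # Each row shows which tokens the row's token can see (1) or not (0).
        for tok, row in zip(tokens, mask):
            print(f"  {tok:>4} {row}")
```

In the causal mask, "the" sees only itself while "sat" sees all three tokens. Hiding future positions in this way is what lets a decoder-only model be trained to predict the next token.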

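3. Key Drivers of the Shift

Self-supervised learning: Training on massive amounts of unlabeled internet data removes the bottleneck of human annotation.
Scaling laws: Empirical observations show that model performance improves predictably as model size (parameter count), data volume, and compute increase.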
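
The "predictable improvement" described by scaling laws is often written as a power law in which loss falls as parameters and training tokens grow. The sketch below assumes one commonly used parametric form, L(N, D) = E + A / N^alpha + B / D^beta. The constants are illustrative placeholders, not fitted values from any study, and the 6 * N * D compute estimate is a widely used rule of thumb rather than an exact count.

```python
def predicted_loss(
    n_params: float,
    n_tokens: float,
    e: float = 1.7,       # irreducible loss (placeholder value)
    a: float = 400.0,     # scale of the model-size term (placeholder value)
    b: float = 400.0,     # scale of the data term (placeholder value)
    alpha: float = 0.34,  # model-size exponent (placeholder value)
    beta: float = 0.28,   # data exponent (placeholder value)
) -> float:
    """Assumed power-law form: loss shrinks as parameters and tokens grow."""
    return e + a / n_params**alpha + b / n_tokens**beta


def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: about 6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    # Grow model size and data together by 10x per step.
    for n, d in [(1e8, 2e9), (1e9, 2e10), (1e10, 2e11)]:
        loss = predicted_loss(n, d)
        flops = approx_training_flops(n, d)
        print(f"N={n:.0e}  D={d:.0e}  FLOPs~{flops:.1e}  loss~{loss:.3f}")
```

With these placeholder constants the printed loss decreases at every step but never drops below the irreducible term `e`, which is the qualitative behavior the scaling-law observation describes.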

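Key Insight

AI has evolved from "task-specific tools" into "general-purpose agents" with emergent abilities such as reasoning and in-context learning.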

Question 1

What is the primary difference between the "Old Paradigm" and the "New Paradigm" of AI?

Moving from cloud computing to local processing.

Moving from task-specific training to centralized pre-training with prompting.

Moving from Python to C++ for model development.

Moving from Decoder-only to Encoder-only architectures.

Question 2

According to scaling laws, which three factors are fundamentally linked to model performance?

Internet speed, RAM size, and CPU cores.

Human annotators, code efficiency, and server location.

Model size (parameters), data volume (tokens), and total computation.

Prompt length, temperature setting, and top-k value.

Challenge: Evaluating Architectural Fitness

Apply your knowledge of model architectures to real-world scenarios.

You are an AI architect tasked with selecting the right foundational approach for two different projects. For each one, you must choose between an Encoder-only architecture (like BERT) and a Decoder-only architecture (like GPT).

Task 1

You are building a system that only needs to classify incoming emails as "Spam" or "Not Spam" based on the entire context of the message. Which architecture is more efficient for this narrow task?

Solution: Encoder-only (e.g., BERT)

Because the task is classification and requires deep, bidirectional understanding of the text without needing to generate new text, an Encoder-only model is highly efficient and appropriate.

Task 2

You are building a creative writing assistant that helps authors brainstorm ideas and write the next paragraph of their story. Which architecture is the modern standard for this?

Solution: Decoder-only (e.g., GPT/Llama)

This task requires open-ended text generation. Decoder-only models are designed specifically for auto-regressive next-token prediction, making them the standard for generative AI applications.
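
The auto-regressive loop mentioned in this solution can be sketched in a few lines of Python. The "model" here is a hand-written lookup table standing in for a trained decoder-only network; `NEXT_TOKEN`, `predict_next`, and `generate` are hypothetical names used only for illustration.

```python
# Toy stand-in for a trained model: maps the most recent token to the next one.
# A real decoder-only model would condition on the whole context instead.
NEXT_TOKEN = {
    "<bos>": "once",
    "once": "upon",
    "upon": "a",
    "a": "time",
    "time": "<eos>",
}


def predict_next(context: list[str]) -> str:
    """Hypothetical greedy next-token prediction from the current context."""
    return NEXT_TOKEN.get(context[-1], "<eos>")


def generate(prompt: list[str], max_new_tokens: int = 10) -> list[str]:
    """Auto-regressive loop: each predicted token is appended and fed back in."""
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        next_token = predict_next(tokens)
        if next_token == "<eos>":  # stop when the model signals end of text
            break
        tokens.append(next_token)
    return tokens


if __name__ == "__main__":
    print(" ".join(generate(["<bos>"])))
```

The key point is the feedback step: every generated token becomes part of the input for the next prediction, which is why this family of models fits open-ended writing tasks.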